feat: enable HTTP completion endpoint to accept arrays of prompts and generate multiple completions per prompt #3953
Enable the completion endpoint to accept arrays of prompts and generate n completions per prompt, matching vLLM behavior.
- Add utility functions to handle prompt arrays (get_prompt_batch_size, extract_single_prompt)
- Implement batch processing in the HTTP handler with proper choice-index remapping
- Add validation for total choices (batch_size × n ≤ 128)
- Generate a unique request_id for each prompt to avoid conflicts
- Add comprehensive tests for batch prompts and n-parameter combinations
- Maintain backward compatibility with single-prompt requests

The choice index formula matches vLLM: final_index = prompt_idx * n + choice_idx
Example: 3 prompts with n=2 yield indices 0, 1 (prompt 0), 2, 3 (prompt 1), and 4, 5 (prompt 2).
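A minimal Rust sketch of the two rules above: the vLLM-style choice-index remapping and the batch_size × n ≤ 128 cap. The names `validate_total_choices`, `remap_choice_index`, and `all_final_indices`, and the `String` error type, are hypothetical illustrations, not the PR's actual code in `completions.rs` or `openai.rs`.

```rust
/// Cap on batch_size × n, as stated in the PR description.
const MAX_TOTAL_CHOICES: usize = 128;

/// Hypothetical validator: reject a request whose total number of choices
/// (number of prompts times n) exceeds MAX_TOTAL_CHOICES.
fn validate_total_choices(batch_size: usize, n: usize) -> Result<usize, String> {
    // checked_mul guards against overflow for absurdly large inputs.
    let total = batch_size
        .checked_mul(n)
        .ok_or_else(|| String::from("batch_size * n overflows usize"))?;
    if total > MAX_TOTAL_CHOICES {
        return Err(format!(
            "total choices {} (batch_size {} x n {}) exceeds the limit of {}",
            total, batch_size, n, MAX_TOTAL_CHOICES
        ));
    }
    Ok(total)
}

/// Hypothetical remapping: choice `choice_idx` of prompt `prompt_idx` lands at
/// final_index = prompt_idx * n + choice_idx in the merged response.
fn remap_choice_index(prompt_idx: usize, n: usize, choice_idx: usize) -> usize {
    debug_assert!(choice_idx < n, "choice_idx must be smaller than n");
    prompt_idx * n + choice_idx
}

/// All final indices for a batch, in response order (prompt-major).
fn all_final_indices(batch_size: usize, n: usize) -> Vec<usize> {
    (0..batch_size)
        .flat_map(|p| (0..n).map(move |c| remap_choice_index(p, n, c)))
        .collect()
}
```

For 3 prompts with n=2, `all_final_indices(3, 2)` walks prompt 0, then prompt 1, then prompt 2, producing indices 0 through 5 in the order listed above, and `validate_total_choices(3, 2)` accepts the request with 6 total choices.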
Signed-off-by: zhongdaor <[email protected]>
Walkthrough: The pull request implements batch-aware handling for LLM completions by adding detection logic that routes single-prompt and multi-prompt requests through dedicated code paths. Batch utilities extract and validate prompts, enforce a limit on total choices, and remap choice indices per prompt, including streaming and annotation handling.
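To make the single- vs. multi-prompt split concrete, here is a simplified Rust sketch of the two prompt helpers the PR names (`get_prompt_batch_size`, `extract_single_prompt`) plus per-prompt request IDs. The `Prompt` enum, both signatures, the `fan_out` helper, and the `{base_id}-{idx}` ID scheme are all assumptions for illustration; the real types in `lib/llm/src/protocols/openai/completions.rs` may differ.

```rust
/// Hypothetical, simplified model of the `prompt` field: one string or an
/// array of strings. (Token-id prompts are left out of this sketch.)
#[derive(Debug, Clone, PartialEq)]
enum Prompt {
    Single(String),
    Batch(Vec<String>),
}

/// Illustrative batch-size helper: a plain string counts as a batch of one.
fn get_prompt_batch_size(prompt: &Prompt) -> usize {
    match prompt {
        Prompt::Single(_) => 1,
        Prompt::Batch(items) => items.len(),
    }
}

/// Illustrative extraction of prompt `idx` as a standalone single prompt.
/// Returns None when `idx` is out of range.
fn extract_single_prompt(prompt: &Prompt, idx: usize) -> Option<String> {
    match prompt {
        Prompt::Single(s) if idx == 0 => Some(s.clone()),
        Prompt::Single(_) => None,
        Prompt::Batch(items) => items.get(idx).cloned(),
    }
}

/// Fan a request out into (request_id, prompt) pairs, one per prompt, so that
/// concurrent sub-requests never share an ID. The "{base_id}-{idx}" format is
/// an assumption, not the PR's documented scheme.
fn fan_out(base_id: &str, prompt: &Prompt) -> Vec<(String, String)> {
    (0..get_prompt_batch_size(prompt))
        .filter_map(|idx| {
            extract_single_prompt(prompt, idx).map(|p| (format!("{}-{}", base_id, idx), p))
        })
        .collect()
}
```

Each `(request_id, prompt)` pair could then be dispatched as an independent single-prompt request, with the resulting choices merged back using the prompt_idx * n + choice_idx formula.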
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
Pre-merge checks: ✅ Passed checks (3 passed)
Updated the examples and test plan in the description.
Syncing with main to pull in this change, which should hopefully fix the failing deploy tests: https://github.com/ai-dynamo/dynamo/pull/4089/files
May also need this one for the deploy test failures: #4130
ryan-lempka left a comment:
Would like to see a test for an empty prompt array to make sure it's properly rejected. Otherwise LGTM!
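The requested check could look roughly like the following Rust sketch. `validate_prompt_batch` is a hypothetical validator and its error message is invented; a test against the real HTTP handler would instead assert on the error response returned for `"prompt": []`.

```rust
/// Hypothetical validator for an array-valued `prompt` field: an empty array
/// is rejected; otherwise the batch size is returned.
fn validate_prompt_batch(prompts: &[String]) -> Result<usize, String> {
    if prompts.is_empty() {
        return Err("`prompt` must contain at least one entry".to_string());
    }
    Ok(prompts.len())
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn empty_prompt_array_is_rejected() {
        assert!(validate_prompt_batch(&[]).is_err());
    }

    #[test]
    fn non_empty_prompt_array_is_accepted() {
        let prompts = vec!["Say test 1".to_string(), "Say test 2".to_string()];
        assert_eq!(validate_prompt_batch(&prompts), Ok(2));
    }
}
```

Rejecting the empty array up front matters because a batch size of 0 would otherwise pass the batch_size × n ≤ 128 check and produce a response with no choices.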
ryan-lempka left a comment:
LGTM
Overview:
This PR enables the completion endpoint to accept arrays of prompts and generate multiple completions per prompt.
Where should the reviewer start?
- lib/llm/src/protocols/openai/completions.rs: contains the new validation logic and utility functions
- lib/llm/src/http/service/openai.rs: contains the batch processing implementation with choice index remapping
Test Plan:
curl localhost:8000/v1/completions -H "Content-Type: application/json" -d '{"model": "Qwen/Qwen3-0.6B", "prompt": ["Say test 1", "Say test 2"], "max_tokens": 50, "temperature": 0.7, "n": 1}' | jq

{
  "id": "cmpl-342716fb-9fbe-42bf-b874-48dd150c6bba-1",
  "choices": [
    {
      "text": "234, 3214, 4321, 4123, 1234, 2413, 1324, 4312, 413",
      "index": 0,
      "finish_reason": "length"
    },
    {
      "text": "015\nLet $T_{n}$ be the set of all possible expressions of the form $\\frac{a_n}{b_n} + \\frac{c_n}{d_n}$, where $a_n, b_n, c_n",
      "index": 1,
      "finish_reason": "length"
    }
  ],
  "created": 1762282615,
  "model": "Qwen/Qwen3-0.6B",
  "system_fingerprint": null,
  "object": "text_completion",
  "usage": {
    "prompt_tokens": 4,
    "completion_tokens": 50,
    "total_tokens": 54
  }
}
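To check a response like the one in this test plan, a client can invert the index formula: prompt_idx = index / n and choice_idx = index % n. The Rust sketch below uses hypothetical helpers `split_choice_index` and `group_by_prompt` that operate on plain `(index, text)` pairs rather than any real response type. With n = 1, as in this test, index 0 belongs to "Say test 1" and index 1 to "Say test 2".

```rust
/// Inverse of final_index = prompt_idx * n + choice_idx.
/// Returns (prompt_idx, choice_idx). Panics if n == 0.
fn split_choice_index(final_index: usize, n: usize) -> (usize, usize) {
    assert!(n > 0, "n must be at least 1");
    (final_index / n, final_index % n)
}

/// Group (index, text) pairs from a merged response by originating prompt.
/// Choices are sorted by index first, so each prompt's completions come out
/// in choice order even if the input list is not ordered by index.
fn group_by_prompt<'a>(
    choices: &[(usize, &'a str)],
    n: usize,
    batch_size: usize,
) -> Result<Vec<Vec<&'a str>>, String> {
    let mut sorted = choices.to_vec();
    sorted.sort_by_key(|&(idx, _)| idx);

    let mut grouped: Vec<Vec<&'a str>> = vec![Vec::new(); batch_size];
    for (idx, text) in sorted {
        let (prompt_idx, _choice_idx) = split_choice_index(idx, n);
        match grouped.get_mut(prompt_idx) {
            Some(slot) => slot.push(text),
            None => {
                return Err(format!(
                    "choice index {} is out of range for {} prompts",
                    idx, batch_size
                ))
            }
        }
    }
    Ok(grouped)
}
```

An index that maps past the last prompt is reported as an error rather than silently dropped, which makes a remapping bug visible in a test.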